Memory Vulnerability: A Case for Delaying Error Reporting
To face future reliability challenges, it is necessary to quantify the risk of error in any part of a computing system. To this end, the Architectural Vulnerability Factor (AVF) has long been used for chips. However, this metric is computed through offline characterisation, which is inappropriate for memory. We survey the literature and formalise one of the metrics used, the Memory Vulnerability Factor (MVF), and extend it to account for false errors: reported errors that would have no impact on the program if they were ignored. We measure the resulting False-Error-Aware MVF (FEA) and related metrics precisely in a cycle-accurate simulator, and compare them with the effects of injecting faults into a program's data in native parallel runs. Our findings show that MVF and FEA are the only two metrics that are safe to use at runtime, as both consistently give an upper bound on the probability of an incorrect program outcome. FEA gives a tighter bound than MVF and, of all the metrics considered, correlates best with the incorrect-outcome probability.
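The distinction between MVF and FEA can be illustrated with a toy access-trace model. Everything below (the trace format, the `'Rdead'` marker, the interval accounting) is our own simplification for illustration, not the paper's formal definitions.

```python
# Toy model of memory vulnerability metrics (illustrative only). A word is
# "vulnerable" during an interval if a fault occurring in that interval would
# be consumed, i.e. the interval ends with a read. MVF counts all such
# word-cycles; a false-error-aware variant additionally discounts reads whose
# value provably never affects the program outcome -- here we simply mark
# those reads as 'Rdead' to mimic that effect.

def vulnerability(trace, total_cycles):
    """trace: sorted list of (cycle, op) with op in {'W', 'R', 'Rdead'}."""
    vulnerable = 0       # MVF numerator
    fea_vulnerable = 0   # FEA-like numerator (false errors excluded)
    for (c0, _), (c1, op1) in zip(trace, trace[1:]):
        span = c1 - c0
        if op1 in ('R', 'Rdead'):   # fault would be consumed: counts for MVF
            vulnerable += span
            if op1 == 'R':          # consumed AND actually affects the outcome
                fea_vulnerable += span
    return vulnerable / total_cycles, fea_vulnerable / total_cycles

trace = [(0, 'W'), (10, 'R'), (20, 'W'), (90, 'Rdead'), (100, 'W')]
mvf, fea = vulnerability(trace, 100)
# mvf == 0.8 (cycles 0-10 and 20-90), fea == 0.1 (only cycles 0-10 matter)
```

Note how FEA is never larger than MVF, matching the paper's observation that FEA gives the tighter of the two safe upper bounds.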
The Eix del Llobregat through the Berguedà county: a comparative analysis and proposals for a better use of its territorial and economic effects on the county
Focusing on the Eix del Llobregat and its two major pieces of infrastructure built within the Berguedà county, the Túnel del Cadí and the C-16 motorway, this study examines the impact the new road has had on the county's towns, both in terms of urban growth and of economic development, specifically within the tourism sector.
Theatre in Sabadell over the last 30 years: professionalisation and diversification
From 1980 to the present, theatre in Sabadell has evolved in two directions: professionalisation, in varying degrees and nuances, and diversification. Over the last three decades, and in line with what has happened in a broader context, local stage activity has become richer: new groups, new generations of professionals and new venues have emerged, increasing the complexity of the 'traditional' playing field. In the shadow of these trends, phenomena such as the multiplication of training options, an unmistakable commitment to the musical genre and a progressively more discreet role for the public sector have also been recorded.
The use of geographic information technologies (TIG) in informal settlements: an indispensable evaluation and planning tool. The case of ESF-Cat in Mozambique
With the aim of improving the planning and evaluation of ESF-Cat's urban projects, the Barrios Programme in Maputo (Mozambique) set out from its beginnings to build, as an indispensable tool, a baseline information layer using geographic information technologies (TIG), and in particular GIS. The upcoming elaboration of the Urban Plan for the Maxaquene 'A' neighbourhood further confirms the real need for organised, centralised base information, and also puts on the table the ethical and economic debate over using free versus proprietary software. This article presents the evolution and current status of the urban GIS, as well as its main problems and results.
Prediction of the impact of network switch utilization on application performance via active measurement
Although fast interconnection networks are one of the key characteristics of High Performance Computing (HPC) infrastructures, the increasingly large computational capacity of HPC nodes and the subsequent growth of data exchanges between them constitute a potential performance bottleneck. To achieve high performance in parallel executions despite network limitations, application developers require tools to measure their codes' network utilization and to correlate the network's communication capacity with the performance of their applications.
This paper presents a new methodology to measure and understand network behavior. The approach is based on two techniques that inject extra network communication. The first technique measures the fraction of the network that is utilized by a software component (an application or an individual task) to determine the existence and severity of network contention. The second injects large amounts of network traffic to study how applications behave on less capable or fully utilized networks. The measurements obtained by these techniques are combined to predict the performance slowdown suffered by a particular software component when it shares the network with others. Predictions are obtained by considering several training sets that use raw data from the two measurement techniques. The sensitivity of the training set size is evaluated by considering 12 different scenarios. Our results find the optimum training set size to be around 200 training points. When optimal data sets are used, the proposed methodology provides predictions with an average error of 9.6% across 36 scenarios.
With the support of the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Expedient 2013BP_B00243). The research leading to these results has received funding from the European Research Council under the European Union's 7th FP (FP/2007-2013) / ERC GA n. 321253. Work partially supported by the Spanish Ministry of Science and Innovation (TIN2012-34557).
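The prediction step, combining training points gathered with the two injection techniques, can be caricatured as fitting a model over (utilization, slowdown) pairs. The linear form and the synthetic data below are assumptions for illustration only; the paper's actual models and training sets differ.

```python
# Hedged sketch: fit a simple least-squares line through synthetic
# (fraction of network consumed by co-runners, observed slowdown) points,
# then use it to predict the slowdown of a component on a shared network.

def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Synthetic training data, invented for this sketch.
train = [(0.0, 1.00), (0.25, 1.05), (0.5, 1.12), (0.75, 1.30), (0.9, 1.45)]
a, b = fit_line(train)

def predict(utilization):
    """Predicted slowdown when co-runners consume `utilization` of the network."""
    return a * utilization + b
```

In the paper this role is played by models trained on roughly 200 points; the point here is only the shape of the pipeline (measure, train, predict).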
Iteration-fusing conjugate gradient
This paper presents the Iteration-Fusing Conjugate Gradient (IFCG) approach, an evolution of the Conjugate Gradient method that consists in i) letting computations from different iterations overlap and ii) splitting linear algebra kernels into subkernels to increase concurrency and relax data dependencies. The paper presents two ways of applying the IFCG approach: the IFCG1 algorithm, which aims at hiding the cost of parallel reductions, and the IFCG2 algorithm, which aims at reducing idle time by starting computations as soon as possible. IFCG1 and IFCG2 are complementary approaches to increasing parallel performance. Extensive numerical experiments are conducted to compare their numerical stability and performance against four state-of-the-art techniques. By considering a set of representative input matrices, the paper demonstrates that IFCG1 and IFCG2 provide parallel performance improvements of up to 42.9% and 41.5% respectively, and average improvements of 11.8% and 7.1%, with respect to the best state-of-the-art techniques, while keeping similar numerical stability properties. The paper also evaluates the IFCG algorithms' sensitivity to system noise and demonstrates that they run 18.0% faster on average than the best state-of-the-art technique under realistic degrees of system noise.
This work has been supported by the Spanish Government (Severo Ochoa grant SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and by the IBM/BSC Deep Learning Center Initiative.
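For context, a minimal textbook Conjugate Gradient makes visible the two per-iteration dot-product reductions that IFCG restructures; this sketch is the standard baseline algorithm, not the IFCG1/IFCG2 variants themselves, and the matrix and tolerance are illustrative.

```python
# Textbook CG in pure Python, with the two global reductions per iteration
# called out: in a parallel run each dot product is a synchronization point
# whose cost IFCG1 hides by overlapping iterations and IFCG2 attacks by
# splitting kernels into subkernels.

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def axpy(a, x, y): return [a * xi + yi for xi, yi in zip(x, y)]
def matvec(A, x): return [dot(row, x) for row in A]

def cg(A, b, tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = b[:]                      # residual r = b - A x (x starts at 0)
    p = r[:]
    rs = dot(r, r)                # reduction #1: a global sync in parallel
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)   # reduction #2: another global sync
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = axpy(rs_new / rs, p, r)
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]      # small symmetric positive-definite example
b = [1.0, 2.0]
x = cg(A, b)                      # converges to [1/11, 7/11]
```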
Open-Source GEMM Hardware Kernels Generator: Toward Numerically-Tailored Computations
Many scientific computing problems can be reduced to Matrix-Matrix
Multiplications (MMM), making the General Matrix Multiply (GEMM) kernels in the
Basic Linear Algebra Subroutine (BLAS) of interest to the high-performance
computing community. However, these workloads have a wide range of numerical
requirements. Ill-conditioned linear systems require high-precision arithmetic
to ensure correct and reproducible results. In contrast, emerging workloads
such as deep neural networks, which can have millions to billions of
parameters, have shown resilience to arithmetic tinkering and precision
lowering.
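A small sketch of why numerically tailored GEMM kernels matter: the same multiply-accumulate carried out with a binary32 versus a binary64 accumulator can give very different answers on cancellation-heavy data. The inputs below are contrived for illustration, and the `struct` round-trip is just a convenient way to get true float32 rounding in Python.

```python
# Naive GEMM with a pluggable rounding function applied to every product and
# partial sum, to contrast a binary64 accumulator with a binary32 one.
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def gemm(A, B, rnd=lambda x: x):
    """C = A @ B, rounding each product and partial sum with `rnd`."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc = rnd(acc + rnd(A[i][p] * B[p][j]))
            C[i][j] = acc
    return C

# Large terms that should cancel, leaving exactly 1.0.
A = [[1e8, 1.0, -1e8]]
B = [[1.0], [1.0], [1.0]]
high = gemm(A, B)[0][0]           # binary64 accumulator keeps the 1.0
low  = gemm(A, B, rnd=f32)[0][0]  # binary32 accumulator absorbs it: 0.0
```

An ill-conditioned system amplifies exactly this kind of absorption, while a DNN workload typically shrugs it off, which is the precision trade-off the generator targets.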
The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform
The High-Performance Conjugate Gradient (HPCG) benchmark complements the LINPACK benchmark in the performance-evaluation coverage of large High-Performance Computing (HPC) systems. Due to its lower arithmetic intensity and higher memory pressure, HPCG is recognized as a more representative benchmark for data-center and irregular-memory-access-pattern workloads, and its popularity and acceptance within the HPC community are therefore rising. As only a small fraction of the reference version of the HPCG benchmark is parallelized with shared-memory techniques (OpenMP), we introduce in this report two OpenMP parallelization methods. Given the increasing importance of the Arm architecture in the HPC scenario, we evaluate our HPCG code at scale on a state-of-the-art HPC system based on the Cavium ThunderX2 SoC. We consider our work a contribution to the Arm ecosystem: along with this technical report, we plan to release our code to help the tuning of the HPCG benchmark within the Arm community.
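For context, HPCG's dominant kernel is a sparse matrix-vector product whose row loop is the natural shared-memory parallelization target. The CSR sketch below is illustrative and is not HPCG's actual data structure or code.

```python
# Sparse matrix-vector product over Compressed Sparse Row (CSR) storage.
# Each output row is independent, which is exactly what makes the outer loop
# a natural `#pragma omp parallel for` target in the C reference code.

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A x for A stored in CSR form (values, column indices, row offsets)."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [1, 0, 2]]
values  = [2.0, 1.0, 3.0, 1.0, 2.0]
col_idx = [0,   2,   1,   0,   2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0])  # -> [3.0, 3.0, 3.0]
```

The benchmark's other kernels (notably the symmetric Gauss-Seidel smoother) carry loop-order dependencies, which is why parallelizing HPCG beyond this loop takes more than a single pragma.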
TaskPoint: sampled simulation of task-based programs
Sampled simulation is a mature technique for reducing simulation time of single-threaded programs, but it is not directly applicable to simulation of multi-threaded architectures. Recent multi-threaded sampling techniques assume that the workload assigned to each thread does not change across multiple executions of a program. This assumption does not hold for dynamically scheduled task-based programming models. Task-based programming models allow the programmer to specify program segments as tasks which are instantiated many times and scheduled dynamically to available threads. Due to system noise and variation in scheduling decisions, two consecutive executions on the same machine typically result in different instruction streams processed by each thread. In this paper, we propose TaskPoint, a sampled simulation technique for dynamically scheduled task-based programs. We leverage task instances as sampling units and simulate only a fraction of all task instances in detail. Between detailed simulation intervals we employ a novel fast-forward mechanism for dynamically scheduled programs. We evaluate the proposed technique on a set of 19 task-based parallel benchmarks and two different architectures. Compared to detailed simulation, TaskPoint accelerates architectural simulation with 64 simulated threads by an average factor of 19.1 at an average error of 1.8% and a maximum error of 15.0%.
This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493 and SEV-2011-00067), the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence and the Mont-Blanc project (EU-FP7-610402 and EU-H2020-671697). M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas is supported by the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the EU FP7 (contract 2013BP B 00243). T. Grass has been partially supported by the AGAUR of the Generalitat de Catalunya (grant 2013FI B 0058).
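The sampling idea behind using task instances as sampling units can be caricatured in a few lines: simulate a small sample of instances of a task type in detail, then extrapolate from the sample mean instead of simulating every instance. The numbers below are synthetic, and TaskPoint's actual fast-forward mechanism is far more involved than this extrapolation.

```python
# Toy illustration of instance sampling: estimate the total cycles of 1000
# instances of one task type from a detailed sample of 20, trading a small
# estimation error for a large reduction in detailed-simulation work.
import random

random.seed(0)
# Synthetic per-instance cycle counts for one task type, with some noise.
instances = [1000 + random.randint(-50, 50) for _ in range(1000)]

sample = instances[:20]                          # "detailed simulation" subset
estimate = sum(sample) / len(sample) * len(instances)
actual = sum(instances)
error = abs(estimate - actual) / actual          # small, for a 50x reduction
```

Real task instances of the same type tend to execute similar code on similar working-set sizes, which is the regularity that makes them good sampling units despite the scheduler assigning them to different threads on every run.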